17 research outputs found

    Position Heaps for Parameterized Strings

    Get PDF
    We propose a new indexing structure for parameterized strings, called the parameterized position heap. The parameterized position heap supports the parameterized pattern matching problem, in which a pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We give an online algorithm that constructs the parameterized position heap of a text in time linear in the text size. We also show that, using the parameterized position heap, all occurrences of a pattern in the text can be found in time linear in the product of the pattern size and the alphabet size.
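The bijective-mapping notion above can be made concrete with Baker's classic prev-encoding, under which two parameterized strings match exactly when their encodings are equal. The sketch below is a naive O(n·m) matcher built on that idea, not the position heap itself; the function names are illustrative.

```python
def prev_encode(s):
    """Encode a parameterized string: each symbol becomes the distance
    to its previous occurrence, or 0 for a first occurrence."""
    last = {}
    out = []
    for i, c in enumerate(s):
        out.append(i - last[c] if c in last else 0)
        last[c] = i
    return out

def p_match_positions(text, pattern):
    """Naive parameterized matching: report every position where some
    bijective renaming of the pattern's symbols yields the substring."""
    m = len(pattern)
    pp = prev_encode(pattern)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m]) == pp]
```

For example, the pattern "xy" parameterized-matches every length-2 substring of "abab", since any pair of distinct symbols can be renamed to (x, y).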

    Unsupervised spam detection based on string alienness measures

    No full text
    We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We introduce three measures that quantify the alienness (i.e., how different a class is from the others) of substring equivalence classes within a given set of strings. A document is then classified as spam if it contains a member of a characteristic equivalence class as a substring. The proposed method is unsupervised, language-independent, and very efficient. Computational experiments on data collected from Japanese web forums show fairly good results.
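The abstract does not spell out the equivalence relation or the three measures, so the following is only a hedged sketch of the general idea: group substrings by the set of positions at which they end (the right-extension equivalence familiar from suffix structures), and score each class with a hypothetical stand-in measure. All names and the scoring function are assumptions for illustration.

```python
from collections import defaultdict

def right_equiv_classes(strings, max_len=8):
    """Group substrings (up to max_len) by the set of positions where
    they end; substrings sharing exactly the same ending positions
    form one equivalence class."""
    ends = defaultdict(set)
    for d, s in enumerate(strings):
        for j in range(1, len(s) + 1):
            for i in range(max(0, j - max_len), j):
                ends[s[i:j]].add((d, j))
    classes = defaultdict(list)
    for sub, pos in ends.items():
        classes[frozenset(pos)].append(sub)
    return classes

def alienness(members):
    # hypothetical stand-in for the paper's measures: score a class by
    # the number of distinct substrings it contains
    return len(members)
```

In "abcabc", the substrings "abc", "bc", and "c" all end at exactly the same two positions, so they fall into one class of size 3.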

    Filtering Multi-set Tree: Data Structure for Flexible Matching Using Multi-track Data

    Get PDF
    Special Section: Nowcast and Forecast of Road Traffic by Data Fusion of Various Sensing Data

    Detecting blog spams using the vocabulary size of all substrings in their copies

    No full text
    This paper addresses the problem of detecting blog spams, which are unsolicited messages posted as blog entries. Unlike a spam mail, a typical blog spam is produced to inflate the PageRank of the spammer's Web sites, so many copies of the spam are posted and all of them contain URLs of those sites. The number of copies, which we call the frequency, therefore seems to be a good key for finding this type of blog spam. The frequency alone is not, however, sufficient for detection algorithms that flag an entry as spam when its frequency exceeds some threshold, for the following reason: it is very difficult to collect Web pages including all copies of a blog entry, so the input data may contain only a few copies of an entry, possibly fewer than the predefined threshold, and thus a frequency-based spam detection algorithm
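The abstract critiques a naive frequency-threshold baseline; a minimal sketch of that baseline (not the paper's own method, which the truncated abstract does not describe) could look like this, with all names assumed for illustration:

```python
from collections import Counter

def frequency_spam_filter(entries, threshold=3):
    """Naive baseline: flag an entry as spam when the number of
    identical copies in the crawled data reaches the threshold."""
    counts = Counter(entries)
    return {e for e, c in counts.items() if c >= threshold}
```

The limitation the abstract points out follows directly: if the crawl captures only two copies of a spam that was posted hundreds of times, its observed frequency stays below the threshold and the filter misses it.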